Application of Proximity Graphs to Editing Nearest Neighbor Decision Rules
Authors
Abstract
Non-parametric decision rules, such as the nearest neighbor (NN) rule, are attractive because no a priori knowledge is required concerning the underlying distributions of the data. Two traditional criticisms directed at the NN-rule concern the large amounts of storage and computation involved due to the apparent necessity to store all the sample (training) data. Thus there has been considerable interest in “editing” or “thinning” the training data in an attempt to store only a fraction of it. Previous editing algorithms suffered from the drawback that they delivered edited sets that were not decision-boundary consistent, i.e., the decision boundary determined by the edited set differed from that specified by the entire original training data. In this paper several geometric methods based on proximity graphs are proposed for editing the training data for use in the NN-rule. Most notably, one of the methods yields a decision-boundary consistent edited set and therefore a decision rule that preserves all the desirable convergence properties of the NN-rule that is based on the original entire training data. The methods are all derived from the Voronoi diagram of the sample data and make use of subgraphs of the Delaunay triangulation. The methods are compared empirically through experiments on synthetic data as well as real-world data in the automatic detection of cervical cancer. Finally, algorithms for the efficient implementation of these techniques are discussed.
منابع مشابه
Geometric Decision Rules for Instance-Based Learning Problems
In the typical nonparametric approach to classification in instance-based learning and data mining, random data (the training set of patterns) are collected and used to design a decision rule (classifier). One of the most well known such rules is the k-nearest neighbor decision rule (also known as lazy learning) in which an unknown pattern is classified into the majority class among the k-neare...
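The k-nearest neighbor rule described in this snippet amounts to a majority vote among the k closest training points. A minimal sketch, assuming Euclidean distance and a plain brute-force search (the helper name `knn_classify` is my own):

```python
import numpy as np
from collections import Counter

def knn_classify(X_train, y_train, x, k=3):
    """Classify x by majority vote among its k nearest training points."""
    d = np.linalg.norm(X_train - x, axis=1)  # Euclidean distances to x
    nearest = np.argsort(d)[:k]              # indices of the k closest
    return Counter(y_train[nearest]).most_common(1)[0][0]
```

With k = 1 this reduces to the single-NN rule discussed in the main paper; larger k trades sensitivity to noise against locality.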
Full text

Asymptotic Properties of Nearest Neighbor Rules Using Edited Data
The convergence properties of a nearest neighbor rule that uses an editing procedure to reduce the number of preclassified samples and to improve the performance of the rule are developed. Editing of the preclassified samples using the three-nearest neighbor rule followed by classification using the single-nearest neighbor rule with the remaining preclassified samples appears to produce a decis...
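The editing procedure this snippet describes (classify each preclassified sample by the k-NN vote of the *other* samples, discard it if misclassified, then classify with 1-NN on what remains) can be sketched as follows. This is a generic Wilson-style leave-one-out edit under the stated k = 3 default; the function name `wilson_edit` and the brute-force distance computation are my own choices, not from the paper.

```python
import numpy as np

def wilson_edit(X, y, k=3):
    """Remove every sample misclassified by the k-NN vote among the
    remaining samples (leave-one-out editing)."""
    keep = np.ones(len(X), dtype=bool)
    for i in range(len(X)):
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf                      # exclude the point itself
        nearest = np.argsort(d)[:k]
        votes = np.bincount(y[nearest], minlength=y.max() + 1)
        if votes.argmax() != y[i]:         # disagrees with its neighbors
            keep[i] = False
    return X[keep], y[keep]
```

Editing with k = 3 tends to strip mislabeled or overlapping samples near the class boundary, and single-NN classification on the edited set then behaves much like a smoother decision rule on the original data.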
Full text

A Modified Editing k-nearest Neighbor Rule
Classification of objects is an important area in a variety of fields and applications. Many different methods are available to make a decision in those cases. The k-nearest neighbor rule (k-NN) is a well-known nonparametric decision procedure. Classification rules based on the k-NN have already been proposed and applied in diverse substantive areas. The editing k-NN proposed by Wilson would be ...
Full text